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Summary 


A  method  of  estimation  for  the  parameters  of  the  multivariate 
normal  distribution  based  on  the  characteristic  function  '’density)  and 
its  sample  counterpart  is  given.  These  M-estimators  are  dependent  on  a 
user-specified  parameter.  The  response  of  the  parameter  estimates  and 
observation  weights  to  variation  of  this  user-specified  parameter  allows 
a  sensitivity  analysis  of  the  data  and  the  model  considered  as  a  single 
entity.  The  estimators  have  desirable  robustness  properties,  are  easy  to 
compute  and  use,  are  relatively  efficient  at  the  multivariate  normal  and 
are  useful  in  identifying  potential  outliers  and  problems  with  the  statis¬ 
tical  assumptions  or  the  data.  The  method  is  extended  to  inc1  ide  multi¬ 
variate  experimental  designs  with  attention  restricted  to  the  two-way 
cross  classification.  Several  illustrations  are  provided. 

Key  Words:  multivariate  normal,  sensitivity  analysis,  M-estimators, 
parametric  density  estimation,  adaptive  estimation, 
two-way  cross  classification 
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1.  Introduction 

A  large  number  of  robust  procedures  are  available  for  univariate 
normal  data.  The  number  of  procedures  for  the  multivariate  normal 
distribution  are  fewer  in  number.  The  recent  books  by  Huber  (1981), 

Barnett  and  Lewis  (1978),  and  Gnanadesikan  (1977)  provide  excellent 
reviews  and  discussions  of  a  major  portion  of  the  literature. 

The  weighted  and  integrated  distance  between  the  assumed  distri¬ 
bution  function  and  its  empirical  counterpart  has  recently  been  used  by 
Parr  and  Shucany  (1980)  to  produce  attractive  robust  estimators  in  the 
univariate  case.  The  extension  of  such  procedures  to  the  multivariate 
case  will,  however,  present  computational  difficulties  which  may  be  in¬ 
superable.  This  is  not  the  case  with  the  estimators  for  location  and 
covariance  matrix  that  we  propose.  The  univariate  case  was  considered  by 
Paulson  and  Nicklin  (1981).  Simultaneous  estimators  for  location  and  the 
covariance  matrix  are  determined  from  consideration  of  the  sum  of  integrated 
residuals  squared  where  the  residuals  are  defined  as  the  difference  between 
a  Gaussian  characteristic  function  and  its  empirical  counterpart  .  This 
sum  is  equivalent  to  the  sum  of  integrated  squared  differences  between  a 
Gaussian  density  and  its  empirical  counterpart.  The  approach  that  we 
propose  is  equivalent  to  one  of  parametric  density  estimation,  but  is  quite 
different  in  spirit  from  the  work  of  Parzen  (1962). 

The  estimators  we  develop  depend  on  a  user  chosen  parameter  X  (or 
parameters  A^).  Variation  of  this  parameter  from  large  values  (roost 
efficient)  to  successively  smaller  values  allows  the  user  to  determine 
the  response  in  the  parameter  values  to  this  variation.  If  the  data 


x1,x2,...,xn  and  the  Gaussian  model  are  internally  consistent,  then  a 
flat  response  surface  will  result.  If  the  model  and  data  are  not  internally 
consistent,  then  the  response  surface  will  not  be  flat  and  this  will  signify 
potential  difficulties  in  the  data  or  with  the  Gaussian  model  or  both.. 

For  each  observation  x.  a  weight  v. .  is  determined.  The  variation  in  the 

J  J  A 

v.^  as  a  function  of  X  is  useful  in  determining  the  weakest  interfaces 
between  the  data  and  the  Gaussian  model.  Accordingly,  an  examination  of 
the  response  surface  of  the  v. ..  as  a  function  of  X  is  useful  in  identifying 
potential  outliers.  We  envision  that  the  primary  use  for  our  procedure 
will  be  in  performing  sensitivity  analyses.  Secondarily,  a  robust  pro¬ 
cedure  results  if  X  is  chosen  as  fixed  in  the  range  -  h  <  X  <  <*>.  This 
resultant  robust  procedure  is  of  a  somewhat  different  character  from 
those  discussed  by  Maronna  (1976)  and  Devlin  et  al.  (1975). 

Detailed  statistical  properties  of  the  modified  integrated  weighted 
squared  error  are  given.  An  extension  of  the  procedure  to  examine  uni¬ 
variate  problems  and  to  the  analysis  of  a  two-way  layout  of  multivariate 
data  is  also  given.  It  is  indicated  that  the  procedure  can  be  used  in  a 
clutfv  ->ing  context.  Several  examples  are  provided. 


2.  The  Estimation  Procedure 


The  x  minimum  procedure  consists  of  determining  estimators  of  a 


set  of  parameters  9.  ,8-,. . .  ,9  by  minimizing 

X  4  S 


2  S 

<  =  L 


i=l  ^i 


(2.1) 


with  respect  to  the  9's.  The  *  p^^O  1,  82,...,8g)  and  vi  represents 
the  number  of  observations  which  fall  in  cell  i  where  the  k  cells  con¬ 
stitute  a  mutually  exclusive  and  exhaustive  partitioning  of  the  sample 


space  of  the  random  sample  x^x^...^  from  a  density  f(x;0lt02.. .  ,0  ). 

2 

The  modified  x  minimum  procedure  consists  of  determining  estimators  of 
®1*®2***.*®s  from  the  system 

k  (v  -np.)  3p. 

I  — =  0.  (2.2) 

i=l  Pi  36j 

namely  the  equations  which  result  from  differentiation  of  (2.1)  with 
respect  to  0  while  regarding  the  denominator  as  constant  (Cramer,  1946, 
pp.  424-428).  The  method  we  propose  for  the  multivariate  Gaussian  is 
entirely  analogous. 

Let  x1,x2,. . . ,xq  be  a  random  sample  of  size  n  from  the  p-dimensional 
Gaussian  distribution  with  pxl  mean  vector  p  and  p*p  covariance  vector 
V.  The  density  is 

I V I  T  1 

f(x)  =  f(x;p,V)  =  —  -V  exp(-  h  (x-p)  V  1  (x-p))  (2.3) 

(2tt)^P 

and  corresponding  characteristic  function 

<Hu)  =  <|>(u;p,V)  =  exp(ipTu  -  \  uTVu)  (2.4) 

T 

where  u  =  (u^,u2 , . . . .u^)  is  a  l*p  vector  of  real  numbers.  We  shall 
generally  suppress  the  arguments  p  and  V  of  f(x)  and  f(u).  Define 


where  u>(«)  is  a  function  to  be  chosen.  The  quantity  b  represents  a  sum 
of  integrated  weighted  squared  residuals.  We  shall  designate  the  solutions 
for  p  and  V  determined  from  the  system 


2  Re  I  f  3ffiU*  (♦(u)  “  expiux.  )*|o)C<(i(u))|2  du  =  0,  (2.6) 

1  =  1  J 

P 

2  Re  £  (  3flU^  ($(u)  -  expiux.)*|u($(u))|2  du  =  0,  (2.7) 

j=l  JRp  3 

as  modified  integrated  weighted  squared  error  estimators.  The  dimension 

of  the  0's  in  (2.6)  and  (2.7)  are  defined  from  context.  The  notation 

denotes  complex  conjugate.  The  equations  (2.6)  and  (2.7)  are  completely 

2 

analogous  to  the  modified  x  minimum  equations  of  (2.3).  The  estimators 
of  y  and  V  determined  from  (2.6)  and  (2.7)  are  M- estimators  and  are 
affine  invariant.  If  w(ij>(u))  is  itself  a  Fourier  transform  with  inverse 
f^(x),  then  (2.6)  and  (2.7)  admit  of  a  representation  in  terms  of  densities. 
By  Parseval's  theorem  (2.6)  and  (2.7)  have  equivalent  expressions, 
respectively 


*  fJx)Xf(x)  $  fjx)  -  fJx-Xj))  dx  = 


0, 


(2.8) 


2(2ir)P  l  f  (-^IM  f  (x))(f(x)  ft  f  (x)  - 

j=l  i 


fu(x-Xj ))  dx  =  0, 


(2.9) 


where  the  symbol  £  represents  convolution  of  the  density  f(x)  with  the 

function  f  (x).  It  is  important  to  note  that  ttt  (f(x)  *  f  (x))  t 
U)  oV  0) 

C x)  x 

— *  fw(x).  Expressions  (2.8)  and  (2.9)  show  that  u»( 4»(u) )  = 

T 

exp(-  h  u  (XV)u)  provides  an  attractive  choice  since  then  f  (x)  is  a 

0) 

Gaussian  density  with  mean  vector  0  and  variance- covariance  matrix  XV, 

X  a  constant  scalar.  The  convolution  of  a  Gaussian  density  with  mean  u 
and  variance-covariance  matrix  V  with  a  Gaussian  density  with  mean  0  and 
variance-covariance  matrix  XV  is  again  Gaussian  but  now  with  mean  p  and 


i 
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variance- covariance  matrix  (1+X)V.  This  choice  for  oj(<(i(u))  leads  to 
attractive  computational  properties  as  well  as  some  attractive  statistical 
properties . 

Other  attractive  choices  of  u)(<j>(u))  are  easily  found,  for  example, 
the  choice  <u( 4>(u) )=(  1—  | <^(u>  | "^ )  ^  effectively  makes  (2.6)  and  (2.7)  an 
approximate  modified  integrated  correlated  x  minimum  procedure  since,  for 
every  fixed  u,  £  (<f(u)  -  expiux. )  is  asymptotically  complex  Gaussian.  This 


choice  leads  to  more  complicated  numerical  algorithms  but  more  efficiency. 

The  most  important  aspect  of  equations  (2.6)  -  (2.9)  is  that  they 
show  that  the  characteristic  function  procedure  we  are  suggesting  is 
generally  equivalent  to  a  multivariate  parametric 


density  estimation  procedure  since  n  1  J  f  (x-x.)  in  (2.8)  and  (2.9),  is 

j=l  J 

an  unbiased  kernel  density  estimator  for  f(x)  *  f  (x).  Thus  the  estima¬ 
tion  procedure  involves  a  reconstruction  of  the  smoothed-by- f  (x)  error 

(0 

density.  The  density  f(x)  is  assumed  a  priori  while  the  estimate 

n  1  I  f  (x-x.)  is  a  kind  of  posterior  estimate  based  on  the  kernel  f  (x) 
u  to  3  w 

and  the  data. 

We  shall  be  primarily  concerned  with  the  choice 


n 


to(<|>(u))  =  exp(-  h  uT  (XV)u) 

in  this  and  the  next  section.  It  may  be  helpful  to  observe  that 
differentiation  of 


(2.10) 


(2.11) 


with  respect  to  p  and  V  with  subsequent  setting  of  D  =  XV  produces  equations 
(2.6)  and  (2.7)  under  the  choice  (2.10)  of  oo( • ) .  The  use  of  this  observation 
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although  unnecessary,  allows  for  a  somewhat  simpler  derivation  of  the 
estimators;  see  Paulson  and  Nicklin  (1981)  for  the  univariate  version. 
The  integral  A^  may  be  explicitly  integrated  to  give 

AD  =  I  I  $11^1 2  exp(-uTDu)du 
RP 

-  — IT  I  exp{-  h  (x.-y)T(V+2D)-1(x.-y)} 

n  |V+2DP  j=l  3  3 

& 

+ - rr  • 

V+Dp 


We  thus  find  (Dwyer,  1967) 


3A 

3W 


Jjp  n 

l  [  (V+2D)*1  (x.-y )  exp(-  h  Q.(D))  =  0, 

V+2Dp  j=l  3  3 


2  (2ir) 

n 


where 


Qj(D)  =  (xj-y)T(V+2D)“1  (x^-y). 


C2.12) 


(2.13) 


We  use  the  right  hand  side  of  (2.12)  to  generate  an  estimating  equation 
by  setting  D  =  XV;  then  the  estimator  for  the  mean  satisfies  the  implicit 
equation 


l  x,  exp(-  Qj(kV)) 

J  J 

y  =  2l± - .  (2.14a) 

n 

l  exp(-  h  Qj(XV)) 
j=l  3 


The  simultaneous  implicit  matrix  equation  for  the  covariance  matrix 
V  is  determined  from  (Dwyer,  1967) 
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ff  =  ^  —  X  I  exp(-  is  Q,(D)) 


^5P  iA 

^  — Tr)~W  (V+2D)"1  l  (x.-w)(x.-u)T(V+2D)~1exp(-  %  Q.(D)) 
n  |  V+2D  |  **  j=l  :  ^  1 


|  g  I  (2V+2D)"1  =  0. 

1 2V+2D  | 35  j=l 


(2.15) 


On  pre-  and  post-multiplying  (2.15)  by  (V+2D)  and  then  setting  D  =  XV 
we  obtain 


l  (V  exp(-  ^  Q.  (XV)  -  (x.-y)(x.-y)T  exp(-  h  Q.(XV)) 

J  ATZA  J  J  J 


1+21  \  is(p+2) 


-  (— ) 
\2+21/ 


V}  =  0 


(2.16) 


whence  V  satisfies,  in  conjunction  with  (2.14a),  the  implicit  equation 


n  _ 

l  (Xj-uKXj-y)1  exp(-  Q^(1V)) 

v  =  wa>y  <2-1‘,b) 

£  [s,ip(-  H  Vxv)  '  (wi)  ] 

The  implicit  relationships  given  in  (2.14)  suggest  that  the 
estimators  u  and  V  be  computed  via  a  fixed  point  algorithm  with 
x  s  n  I  xj  S  =  n  £  (Xj-x)(Xj-R)^  supplying  the  initial  guesses 
wq  Vq  respectively.  Substitute  these  initial  guesses  into  the  right 
hand  side  of  (2.14).  New  estimates  y^  and  are  obtained  from  the 
left  hand  side.  The  new  estimates  y^  and  are  now  substituted  into 
the  right  hand  side  of  (Z.  14)  and  the  process  is  continued  until  some 
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pre-specified  absolute  or  relative  tolerance  between  successive  estimates 
is  satisfied.  We  have  found  this  fixed  point  algorithm  to  be  effective 
in  a  variety  of  practical  problems  and  computer  simulations.  We  have 
not  found  it  necessary  to  use  second  order  search  methods.  This  is  a 
real  advantage  when  the  dimension  p  is  large.  The  question  of  the  values 
of  X  which  will  be  useful  in  practice  is  addressed  in  the  sequel. 


3.  Properties  of  the  Estimators 

The  estimators  for  y  and  V  given  by  (2.10),  say  y  and  V,  are 

M-estimators  as  is  evident  from  the  estimating  equations.  They  are 

explicitly  dependent  on  the  user-chosen  parameter  X,  even  though  this 

dependence  on  X  will  generally  be  suppressed  for  convenience. 

The  estimators  y  and  V  are  well-defined  for  X  >  -  Modification 

of  the  arguments  of  Bryant  and  Paulson  (1979)  shows  that  these  estimators 

are  consistent  for  y  and  V  for  X  >  -  \  when  the  x..  constitute  a  random 

sample  from  N  (y,V).  It  is  obvious  from  (2.14)  that  lim  y  =  y  =  n  1 

X-h=° 

£  Xj  =  x,  the  usual  method  of  moments  or  maximum  likelihood  estimator  of 
y.  If  y^,  y2>  ...»  yn  is  a  random  sample  from  N^y^,^),  and  for  any 

nonsingular  matrix  A,  x.  =a+Ay.,  j  =  1,  2,  ...,n,  then  y_  =  a  +  Ay 

J  i  x  } 

and  V  =  A  V  AT. 
x  y 

The  asymptotic  variance-covariance  matrices  of  the  estimators  y 
and  V  depends  on  the  user-specified  parameter  X.  From  (2.12)  we  find 
that  the  score  function  for  y  is 


=  (Xj-y)  exp(-  h  Q(D)) 


(3.1) 


D=XV 
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T  -1 

where  Q(D)  =  (x-u)  (V+2D)  (x-y).  By  a  standard  Taylor's  series  expan- 

sion  the  quality  n^y-y)  is  asymptotically  p- variate  normal  HpCO,^) 
where,  for  easily  computed  expectations. 


cov<i)  =  *  l*0r  «v.T> 


D=XV 


=  {-  (l+c)-5a(p+2)  I>'1{(l+2c)"l5(p+2)V}{-(l+c)"l5(p+2)  I}-1 

/1+2c+c2\  /  4+8X+4X2  \  ^^p+2^ 

=  (liTc  )  V  =  4+8— A-2)  V,  (3.2) 

'  20  '  '3+8X+4X^/ 

where  c  =  (1+2A)”1  and  I  is  the  pxp  identity  matrix.  The  asymptotic 
efficiency  of  y  relative  to  y  is  the  ratio  of  determinants  of  covariance 
matrices,  namely 

e(il)  =  /3±8Ai_iii2j  p<p+23  (3.3) 

'  4+8X+  U,\2’ 

Table  1  gives  a  short  listing  of  the  pth  root  of  these  efficiencies. 


Table  1 

pth  Root  of  Asymptotic  Relative  Efficiency  of  y 

X 


0 

1 

2 

4 

OB 

1 

.65 

.91 

.96 

.99 

1 

2 

.56 

.88 

.95 

.98 

1 

3 

.49 

.85 

.93 

.98 

1 

4 

.42 

.82 

.92 

.97 

1 

8 

.24 

.72 

.87 

.95 

1 

8 
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The  efficiency  of  y  declines  rapidly  with  increasing  dimension 
for  low  values  of  A.  Efficiency  of  y  increases  with  increasing  A  as 
y  becomes  increasingly  like  X. 

The  asymptotic  variance-covariance  matrix  of  the  *sp(p+l)  esti¬ 
mators  v^»  of  $  is  considerably  more  difficult  to  obtain.  The 
covariances  covCv^, v^ffl)  can  be  determined  from  the  score  functions 

determined  from  (2.12).  For  example,  from  (2.16)  we  find  that  the 
score  function  for  v^  is 

sv  =  vkk  exp^“  35  “  c^-y^)2  exP(-  ^5  Q(AV)) 
vkk 


/1+2A\  h(p+2) 
"  \ 2+2A/ 


kk 


v-1 


where  x^  is  the  kth  component  of  the  vector  x  and  c=(l+2A)  .  The 

asymptotic  variance  of  v^  is 

uferHk«- 


Straightforward,  but  tedious,  computations  gives  the  asymptotic 
efficiency  of  v^  relative  to  the  maximum  likelihood  estimator  v^  as 


kk 


/  1  \2 

/1+2A\  p+4 

U*2aA_ 

\ 2+2A/ 

2  6+8A+4A2  _  /1+2A  \  P+2 


(3.4) 


\3+2A 


2  ~  V  2+2X  / 


(3+2A)4 

Selected  values  of  these  efficiencies  are  given  in  Table  2.  The 
efficiencies  of  v^  are  about  the  same  magnitude  as  those  for  v^.  It 
would  be  desirable  to  have  the  efficiencies  of  the  matrix  estimator 
0  relative  to  V,  the  maximum  likelihood  estimator,  but  these  values 
would  be  troublesome  to  obtain  and  would  not  be  much  different  from 


(3.4). 
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Table  2 

Asymptotic  Efficiencies  of  the  Estimators 

X 


.5 

1 

2 

4 

00 

1 

.78 

.87 

.94 

.98 

1 

2 

.68 

.77 

.84 

.88 

.90 

3 

.59 

.69 

.76 

.80 

CD 

ro 

4 

.52 

.62 

.69 

.73 

.75 

5 

.46 

.56 

.63 

.67 

.69 

In  the  one  dimensional  case,  e(v^)  tends  to  unity  with  increasing 

X.  In  the  p>l  dimensional  case  the  efficiencies  e(v^)  are  bounded 

away  from  unity.  The  estimators  V  thus  have  a  different  character  in 

the  cases  p=l  and  p>l.  The  efficiencies  of  for  fixed  X  decline 

rapidly  with  increasing  dimensionality.  This  implies  that  the  higher 

the  dimensionality,  the  larger  the  values  of  X  one  should  like  to  use 

if  efficiency  is  a  major  consideration  in  the  choice  of  X. 

The  nature  of  V  as  X-*»>  may  be  determined  by  an  application  of 

L' Hospital's  rule.  Some  elementary  manipulations  yield  the  implicit 

equation  n 

l  (x.-y){x.-n) 

V  =  - -  (3.5) 

n 

h  n(p+2)  -  \  l  (x.-w)TV_1(x.-u) 
j  =  l  3  3 

that  the  estimator  V  must  satisfy  asymptotically  in  X.  When  p=l  (3.6) 
may  be  rearranged  to  give  the  usual  maximum  likelihood  or  moment  esti¬ 
mator  for  V.  But  for  pa2  the  estimator  cannot  be  so  arranged;  thus  the 


efficiencies  in  Table  2  are  bounded  away  from  unity  when  p>l  and  X=». 

The  curious  estimator  of  V  defined  implicitly  in  (3.6)  does  not  seem  to 
be  especially  interesting. 

The  estimators  v^,  v^*  • v2V  ***»  ^  are  asymptotically 

*5  p(p+l)  variate  Gaussian  and  are  asympotically  independent  of  y  which 

is  asymptotically  p-dimensional  Gaussian. 

There  are  a  number  of  measures  of  qualitative  robustness  but 

the  most  important  is  the  influence  function.  Most  of  the  other  measures 

are  derived  from  the  influence  function.  The  influence  function  is 

simply  proportional  to  the  score  function  (Huber,  1981,  p.  45  ).  The 

influence  function  at  the  p- variate  Gaussian  distribution  N  (y ,V)  is 

P 

IC<X!“'“)  ■  |E0)I  1  \ 

=  (l+c)ls(p+2)(x-y)  exp(-  |  (x-y)V1(x-y)).  (3.6) 

This  function  is  bounded  and  redescending  to  zero  for  all  -  \  <  X  <  «>. 

It  also  shows  that  the  assumed  Gaussian  distribution  is  playing  an 
adaptive  role  in  the  estimation  of  y.  Furthermore,  the  component 
of  the  vector  x  plays  a  role  in  the  estimation  of  the  components  y ^  y^, 
y^  of  y  as  well  as  y^.  Figures  1  provide  contours  of  this  influence 
function  for  y  for  several  values  of  X  and  correlation  p  at  the  standard  bivariate 
Gaussian  distribution.  The  forms  of  these  contours  suggest  that  the 
procedure  will  adaptively  cluster  the  observations  assumed  to  follow  a 
Gaussian  parent  according  to  those  which  belong  to  the  parent  and  those 
which  do  not  -  provided  X  is  small  enough. 


D=XV 


11>. 


Influence  function  contours  for  u 
at  the  standard  bivariate  normal. 
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The  influence  function  for  V  at  Np(y,V)  is  more  difficult  to 
obtain,  at  least  computationally.  For  the  j,k  element  of  (2.12),  we 
note  that  the  partial  derivative  of  this  element  with  respect  to  V  is 
a  pxp  matrix  and  hence  differentiation  of  (2.12)  with  respect  to  V  is 
facilitated  by  the  introduction  of  tensors.  We  need  not  go  to  such 
lengths.  The  shape  of  the  influence  function  is  the  important  property 
and  it  is  determined  by  the  score  function  Sy  The  influence  function 
for  V  at  Np(y,V)  has  shape  determined  by 

sv  =  c(x-y)(x-y)T  exp( -  h  Q(XV))  +  (l+c)-1^P+2^  V  -  V  exp(-  h  Q(XV)) 

(3.7) 

This  function  is  bounded  and  re-descending.  It  does  not  redescend  to  zero, 
however,  but  rather  to  a  positive  definite  matrix  constant.  Score  functions 
of  sv  for  A=1  and  several  values  of  correlation  p  at  the  standard  bivariate 

ij 

normal  distribution  are  given  in  Figures  2.  The  contours  are  not  closed 
for  large  values  of  the  arguments  x^  and  Xj  of  the  vector  x.  However,  the 
appearance  of  these  contours  still  suggest  the  possibility  of  clustering 
by  the  estimation  procedure.  Moreover,  the  procedure  can  also  be  used  to 
determine  observations  which  might  be  trimmed  from  the  sample,  if  trimming 
or  peeling  were  for  some  reason  a  desirable  thing  to  do.  The  observations 
with  relatively  low  values  of  the  final  weights  v^  =  exp(-  y  (x^-y)  v  (x^-y)) 
are  those  which  might  be  peeled  off. 

The  estimators  y  and  V  are  not  in  the  class  of  those  considered 
by  Maronna  (1976)  even  though  they  are  similar  in  form.  Devlin, 

Gnanadesikan,  and  Kettenring  (1981)  report  favorable  results  obtained 
with  the  use  of  Maronna-  and  Huber-  type  multivariate  estimators. 


1 


1 


l 


i 


I 


♦vlw- 


pe_2a.  Score  function  for  v  at  the  standard 

bivariate  normal,  p  =0,  X=1  (Azimuth  =  45°, 
Elevation  =  30°). 


Figure  2c.  Score  function  for  v^2  at  the  standard 

bivariate  normal,  p=.9 ,  X=1  (Azimuth  =  45°, 
Elevation  =  30°). 


I 

I 


-15- 


4.  Choice  of  X 


There  are  two  possible  uses  for  the  procedure  we  propose  here. 

The  first  is  to  choose  a  single  value  of  X,  possibly  based  on  efficiency 

considerations,  and  use  it  as  a  robust  procedure.  The  choices  X=1  or 

X=2  provide  high  efficiencies  and  good  robustness  properties.  The 

second  use  is  the  one  for  which  the  procedure  was  developed  and  which 

has  proved  most  useful  in  practical  applications.  We  use  the  procedure 

to  generate  a  sensitivity  analysis.  In  a  practical  exploratory  setting 

we  first  compute  the  maximum  likelihood  estimators  and  use  these  for 

starting  values  in  the  iterative  algorithm.  Next  we  take  a  value  of  X, 

4  or  2, and  examine  the  behavior  of  the  estimates.  Finally,  we  would  take 

X=1  or  h  and  gradually  decrease  it.  The  response  surface  of  the  parameter 

values  and  the  final  weights  as  a  function  of  X  are  of  primary  interest. 

In  the  process  we  determine  estimates  and  final  weights  v.  =  exp(-  \  (1+2X)  1 

3  x 

T  —1 

(x_.-u)  ^  ~  (x^-vD)  associated  with  each  observation.  If  the  estimates  are 
sensitive  to  this  variation  in  X,  then  there  are  problems  associated  with 


either  the  data  or  with  the  Gaussian  error  model  or  both.  The  particular 
observation(s)  which  is  (are)  the  potential  cause  of  the  sensitivity  are 


identified  by  low  values  of  v.  vis-a-vis  the  whole  set  of  these  weights. 

3  X 

As  c  increases  (X  decreases)  observations  a  distance  removed  from  y  receive 


lower  weight.  This  discussion  will  be  subsequently  illustrated  with  an 


example. 

The  derivation  which  led  to  the  estimators  given  in  (2.14)  did  not 
require  that  X  in  (2.14a)  be  the  same  as  in  (2.14b).  We  could  for 
example  use  X=1  for  y  and  X=2  for  V.  Under  such  a  choice  we  would  then 
be  able  to  fix  the  efficiencies  that  might  be  desired  for  both  y  and  V. 
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The  asymptotic  efficiencies  are  appropriate  because  some  evidence  is 
available  that  these  estimators  approach  their  asymptotic  distributions 
very  rapidly. 

Furthermore,  we  need  not  have  restricted  ourselves  to  a  scalar 
value  of  X  in  order  to  arrive  at  (2.14).  At  the  expense  of  greater 
algorithmic  complexity  we  could  have  chosen  values  A„  corresponding 
to  each  v. .  in  the  covariance  matrix  V.  Let  the  pxp  matrix  L  =  (1+2X. .) 
and  the  pxp  matrix  M  =  (2+2A„)  and  let  LxV  =  ( ( 1+2A^ ^  )v^ )  denote  the 
Hadamard  product  of  L  with  V.  Then  by  arguments  similar  to  those 
employed  to  arrive  at  (2.14)  it  may  be  shown  that  the  more  general 
estimators  for  y  and  V  satisfy  the  implicit  relations 


l  x.  exp(-  \  (x.-u)T(LxV)  1(x.-y)) 
=  1  J  J  J 


l  exp(-  %  (x.-u)T(LxV)_1(x.-u)) 
j_l  J  J 


(4.1a) 


and 


LxV 


exp(-  h  (Xj-u)T(LxV)_1(Xj-y) ) 


(LxV)(MxV)-1(LxV) 


+  \  (x.-y)(x.-y)T  exp(-  \  (x.-y)T(LxV)  1(x.-p))J  .  (4.1b) 

j=l  J  J  J  J  1 

/ 2+2A .  .  . 

The  identity  MxV  =  ( 1+2 )  x  is  useful  in  computing  (4.1b).  The 

ij 

estimators  of  v„  are  computed  in  a  component-wise  fashion  from  the  final 
iteration  of  (4.1b).  These  estimators  would  be  of  interest  when  it  is 
desired  to  treat  different  components  of  the  x^  differently.  Such  a 
situation  arises  in  the  bounded  influence  regression  problem  (Belsley, 
Kuh,  and  Welsch,  1980)  where  we  may  wish  to  be  relatively  critical  in  our 


r 
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analysis  of  the  dependent  variables  but  less  so  for  the  independent  vari¬ 
ables  since  considerable  information  can  be  associated  with  a  wide  spread 
in  the  independent  variables. 

Specifically,  let  us  partition  the  vectors  x.  as  (y . ,z„ . ,z„ . , . . . ,z  .) 

1  ]  1]  2]  q] 

(yji2j)  where  q  =  p-1.  Let  y_.  be  the  dependent  variable  in  a  regression 

framework  and  let  z .  represent  a  q- vector  of  independent  variables. 

T  T 

Corresponding  to  this  partition  we  have  p  =  (u^.v  )  where  E(y^)  =  u^, 

E(Zj)  =  v.  Further, 


V  = 


vll 

V12  Vlp 

V  V 

11  12 

V21 

V22  V2p 

CN 

CN 

> 

H 

CM 

> 

11 

|, 

• 

• 

v  .  .  .  .V 

P2  PP 

* 

represents  the  corresponding  partition  of  the  covariance  matrix.  The 
regression  of  y..  on  z  ^  is  (Anderson,  1958,  Ch.  2) 

ECy^Zj)  =  P1  +  V12  V~ ^  (z^-v).  (4.2) 

We  may  wish  to  estimate  u1  and  v^  in  a  more  critical  fashion  than  V^2 
and  this  in  turn  in  a  more  critical  fashion  than  v22*  ac^-eve  this 
we  would  take  values  X^  associated  with  v^,  X^2  associated  with  V^2, 
and  X22  associated  with  V22  where  X^  <  X^2  <  X22>  The  matrix  L  would 


thus  be 
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L  = 


11 

X12  * 

.  •  x12 

12 

• 

X22  • 

*  ‘  X22 

• 

12 

X22  * 

•  ‘  X22 

qxi 

vector 

of  ones 

T  ) 


X11  X12- 


X12~  X22-1-1 


sensitivity  analysis  purposes  we  could  consider  first,  say,  X  ,  X12, 
X22,  next  h  X^,  h  X10,  h  X0o,  and  in  general  kX.,.,,  kX.,0,  kX00  for  some 


11’ 


12 


22 


11’  12 ! 


22 


values  of  k.  Points  far  out  and  isolated  in  factor  space  or  points  whose 

response  y^  give  rise  to  large  residuals  will  be  identified  by  the  response  of  the 

-  _  t  -1 

final  observation  weight  v.  =  exp(-  \  (x.-y)  (L*V)  (x.-y)).  For  a  fixed  value 

'  ]  L  3  3 

of  L,  an  estimate  of  the  regression  equation  E(y|z)  is  obtained  by  substituting 
parameter  estimates  y^ ,  V^,  ^  ^"n  Some  of  these  ideas  will 

be  subsequently  illustrated. 


5 .  Examples 

Example  5.1.  The  basic  data  for  this  example  are  taken  from 
Anderson  (1958).  The  first  15  points  consist  of  the  first  two  (of  four) 
components  of  this  data  with  five  additional  (outlying)  observations 
appended.  Me  have  chosen  X=4,  2,  1,  h  for  this  illustration.  Table  3 
provides  the  estimates  of  the  means  and  covariances  as  well  as  the  maximum 
likelihood  estimators.  Table  3  also  summarizes  the  weights 
v^  =  exp(-  j  (x-y)  V  (x-y))  associated  with  each  point  on  the  assump¬ 
tion  that  the  data  follow  a  single  multivariate  Gaussian  distribution. 
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With  X  =  +  <*>,  all  weights  are  the  same.  As  X  decreases  from  +  “>,  the 

weights  become  differentiated.  The  5  outlying  observations  are  rendered 

distinctive  by  their  diminishing  weights  v.  as  X  decreases.  This 

3  ^ 

indicates  that  these  observations  are  not  consistent  with  the  remainder 
of  the  observations  and  the  assumption  of  a  single  Gaussian  distribution. 
This  is  further  highlighted  by  referral  to  equations  (2.8)  and  (2.9). 

At  convergence  of  the  iterative  estimation  procedure  we  find  for  0=u  or 
9=V 

[  af<x)  *  f  (x)  (f(x)*f  (x)  -  n-1  l  f  (x-x.))  dx  =  0  (5.1) 

J  do  U)  Ui  .  0)  ] 

R  :  =  1 


where  f^(x)  is  the  Gaussian  density  with  mean  0  and  covariance  matrix  XV, 
which  implies  that  the  density  f(x)*f^(x)  is  being  estimated  by 

gi(x)  =  n-1  l  | XV |  15  exp(-  ~  (x-x_,)T  V-1  (x-xj). 


3=1 


The  density  f(x)*f  (x)  may  be  regarded  as  a  prior  distribution,  g. (x) 

U  A 

as  a  posterior  density  given  the  x^ .  Figures  3a,  3b,  3c,  depicts  what 

the  estimation  procedure  perceives  as  X  decreases.  For  large  X  the 

density  estimate  is  approximately  uniform.  At  A=2  the  density 

estimate  contours  are  smooth  except  for  some  slight  distortion  in  the 

area  of  (195,130).  At  X=1  the  probability  surface  is  becoming  distorted 

in  the  vicinity  of  the  contamination  but  the  distortion  is  not  yet  pronounced. 

Compare  the  estimates  of  the  parameters  and  the  observation  weights  v.^. 

At  X=*s  the  distortion  has  become  dramatic  and  indeed  separate  "hills" 

for  the  outlying  points  have  formed.  Again  compare  the  estimates  and  v.. 

3  A 

for  X=is.  Since  the  density  estimate  perceives  the  outlying  observations 


Table  3a 


Sensitivity  of  Observational  Weights 

v.  (xioo)  to  Variation  in  A 
D  " 


A 


X1 

X2 

4 

2 

1 

.5 

1 

179 

145 

36 

39 

45 

50 

2 

201 

152 

33 

36 

23 

16 

3 

185 

149 

37 

41 

47 

55 

4 

188 

149 

37 

40 

44 

49 

5 

171 

142 

34 

35 

39 

39 

6 

192 

152 

37 

39 

44 

48 

7 

190 

149 

37 

39 

41 

42 

8 

189 

152 

37 

40 

47 

53 

9 

197 

159 

35 

36 

39 

36 

10 

187 

151 

37 

41 

47 

55 

11 

186 

148 

37 

40 

45 

50 

12 

174 

147 

35 

37 

38 

36 

13 

185 

152 

37 

41 

46 

50 

14 

195 

157 

36 

38 

42 

43 

15 

187 

158 

35 

36 

30 

20 

16 

161 

130 

27 

23 

17 

8 

17 

183 

158 

34 

34 

22 

10 

18 

173 

148 

35 

36 

33 

27 

19 

182 

146 

37 

40 

45 

50 

20 

165 

137 

31 

30 

30 

25 

21 

185 

152 

37 

41 

46 

50 

22 

178 

147 

36 

39 

45 

49 

23 

176 

143 

36 

38 

42 

45 

24 

200 

158 

34 

35 

37 

36 

25 

187 

150 

37 

41 

47 

54 

26 

200 

130 

18 

6 

0 

0 

27 

200 

135 

23 

11 

0 

0 

28 

165 

160 

24 

13 

1 

0 

29 

195 

170 

28 

22 

6 

1 

30 

220 

170 

23 

18 

13 

6 

Table  3b 

Sensitivity  of  Parameter  Estimates  to  Variation  in  A 


A 


4 

2 

1 

.5 

4, 

185.5 

185.2 

184.9 

184.9 

150.1 

150.3 

149.9 

149.7 

vu 

V22 

P 12 

148.1 

148.8 

155.4 

136.6 

80.2 

74.1 

65.7 

52.3 

.52 

.67 

.85 

.87 
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as  not  consistent  with  the  remainder  of  the  data  and  the  single  Gaussian 
assumption,  the  reason  for  the  down  weighting  of  the  outlying  observations 
has  become  clear.  If  we  let  X-K3+,  the  density  estimator  becomes  a  set  of 
Dirac  delta  functions  located  at  each  point. 

At  \=%,  the  procedure  has  effectively  clustered  the  data  with  the 
outlying  observations  excluded  from  the  main  cluster.  Accordingly,  as  X  is 
varied,  a  dramatic  change  in  the  estimators  implies  the  existence  of 
clusters  of  observations  different  from  the  main  cluster  and  not  consistent 
with  the  prior  assumption  of  strict  Gaussianity.  The  reconstruction  of 
the  error  density  becomes  increasingly  dependent  on  the  data  as  X  decreases 
and  hence  the  procedure  is  increasingly  critical  with  decreasing  X. 

Example  5.2.  If  (y^,  z..),  j  =  l,2,...,n  represents  a  random  sample 
from  N2(u,V),  y  =  (u1,v)  ,  then  the  regression  of  y  on  z  is  given  by 

E(y|z)  =  Uj  +  V12  V22  (z-v)  =  60  +  B1z, 

say,  and  the  0's  may  be  computed  from  the  estimates  of  y  and  V.  The 
data  for  this  example  is  taken  from  Andrews  and  Pregibon  (1978)  who  were 
concerned  with  regression  models.  The  data  are  presented  in  Table  4. 

The  least  squares  or  maximum  likelihood  estimates  are  also  presented  in 
Table  4.  We  shall  use  the  L  matrix  version  of  the  modified  integrated 
squared  error  procedure  as  discussed  in  section  4.  We  wish  to  be  most 
critical  with  respect  to  estimation  of  v^  and  y^,  to  somewhat  less  critical 
with  respect  to  estimation  of  V^2,  and  least  critical  with  respect  to  esti¬ 
mation  of  V22  and  v.  Thus  we  take,  for  any  single  application  such  as 
robost  regression,  X^  <  X12  <  X22<  This  choice  seems  to  be  particularly 
appealing  since,  in  a  regression  framework,  we  wish  to  retain  the  high 


Table  4a 


Sensitivity  of  v^L  (xl00)  to  Variation  in  X^,  X12,  X 22 


(X11’  X12 *  X22) 


y  . . 

X 

(2,4.8) 

(1,2,4) 

(.5,1,2) 

1 

95 

15 

100 

99 

97 

2 

71 

26 

72 

48 

20 

3 

83 

10 

85 

76 

65 

4 

91 

9 

96 

93 

88 

5 

106 

15 

88 

82 

74 

6 

87 

20 

96 

89 

72 

7 

93 

18 

99 

96 

87 

8 

100 

11 

98 

97 

97 

9 

104 

8 

95 

91 

84 

10 

94 

20 

96 

91 

78 

11 

113 

7 

81 

70 

54 

12 

96 

9 

99 

97 

95 

13 

83 

10 

85 

76 

65 

14 

84 

11 

88 

81 

71 

15 

102 

11 

97 

95 

93 

16 

100 

10 

98 

97 

96 

17 

105 

12 

92 

89 

84 

18 

57 

42 

40 

11 

0 

19 

121 

17 

50 

33 

19 

20 

86 

11 

92 

86 

79 

21 

100 

10 

98 

97 

96 

Table  4b 

Sensitivity  of  Estimates  to  Variation  in  X^,  ^2’  X22* 


(2,4,8) 

(1,2,4) 

(.5,1,2) 

”l 

13.4 

12.8 

12.3 

V 

94.6 

95.3 

95.9 

Vu 

45.6 

32.7 

23.6 

V12 

-49 . 2(- . 56) 

-29.2(-.43) 

1 

M- 

-P 

00 

1 

fO 

V22 

168.3 

143.9 

125.1 

The  estimate  of  correlation  is  given  in  parentheses 
along  with  v^ 
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efficiency  associated  with  substantial  spread  in  the  independent  variable. 

If  on  the  other  hand,  we  wish  to  be  aware  of  extreme  values  of  the  inde¬ 
pendent  variables,  a  sensitivity  analysis  may  be  more  appropriate.  We 
first  take  1^=2,  ^2=4’  *22=8‘  71:6  we^Shts  v^L  =  exp(-  \  (LxV)  Hx^-jj)) 


are  presented  in  column  3  of  Table  4.  Next  we  take  X^=l,  X^22,  ^22=4+  anc* 
the  results  are  presented  in  column  4.  The  parameter  estimates  are  sensi¬ 
tive  functions  of  L.  The  points  which  are  most  influential  or  potentially 
inconsistent  vis-a-vis  the  linear  model  with  a  Gaussian  error  structure  are 
determined  from  observational  weights  v.  ,  i.e.  those  with  low  values  of 
v.  .  Three  points,  2,  18,  19,  are  especially  singled  out.  Point  18  repre- 
sents  an  extreme  point  in  the  z-space  and  is  most  influential  on  the  estimate 
of  the  slope  8^  =  v12^v22  regression  line.  Point  2  has  the  second- 

most  extreme  value  of  z.  Point  19  produces  an  extreme  residual.  Analogous 
results  obtain  if  all  X . .  =  X  and  X  is  varied  in  order  to  determine  the 
response  of  the  parameter  estimates  and  the  observational  weights  to 
variation  in  X.  In  many  cases,  not  all,  taking  just  a  single  value  of  X 
(or  set  of  values  X^ )  provides  sufficient  information  concerning  the 
response  of  the  parameter  estimates  and  weights  v..  (or  v.  )  to  variation 
in  X  (or  L)  in  the  sense  that  if  X  (the  X^)  were  further  decreased,  the 
trend  in  response  will  be  continued.  In  these  cases  a  robust  analysis 
will  lead  to  the  same  conclusions  as  the  sensitivity  analysis.  In  some 
cases,  a  change  in  trends  will  be  observed  as  X  decreases. 

Example  5.3.  It  is  possible  to  produce  non-Gaussian  data  for  which 
variation  in  X  does  not  lead  to  dramatic  changes  in  the  parameter  estimates 
or  low  values  of  v\^.  The  three  dimensional  data  for  this  illustration 
are  taken  from  Gnanadesikan  (1977,  pp.  50-52).  The  61  triads  of  his 
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example  7  were  obtained  by  appending  a  standard  normal  deviate  to  each  of 
the  coordinates  on  the  surface  of  a  specified  paraboloid.  The  two- 
dimensional  scatter  plots  of  these  data  are  not  suggestive  of  the  data  in 
three  dimensions  lying  near  a  curved  surface.  We  determine  the  response 
of  the  parameter  estimates  to  changes  in  X.  The  mean  vector  estimate 
at  X=8  is  (-3.54,  4.72,  26.94)  while  at  X=k;  it  is  (-3.53,  4.70,  26.95). 

The  variance  estimates  at  X=8  are  (3.81,  2.33,  2.94)  while  at  X=}5  they 
are  (4.43,  2.76,  3.79);  the  correlation  estimates  at  X=8  are  (-.51,  -.47, 
.20)  while  at  they  are  (-.56,  -.50,  .27).  The  estimates  of  the  mean 
are  remarkably  stable  but  the  estimated  variances  increase  with  a  decrease 
in  X.  These  characteristics  imply  that  if  the  data  or  the  Gaussian  dis¬ 
tribution  model  is  not  appropriate,  the  best  place  to  look  for  difficulties 
is  at  the  centroid.  This  is  confirmed  by  the  distribution  of  the  weights 
v.  ,  especially  for  1=^.  For  example,  for  X=3c,  the  largest  three  weights 

3  A 

v.  ,  .89,  .84,  .82,  which  indicate  that  there  are  no  observations  near 
3  * 

the  centroid.  This  deficiency  of  observations  near  the  centroid  is  also 

determinable  from  X=8  results,  but  it  is  highlighted  at  the  smaller  values 

_  2 

of  X.  A  plot  of  -2(1+2X)  log  v. .on  x  paper  with  3  degrees  of  freedom 

3  A 

further  emphasizes  the  inappropriateness  of  the  Gaussian  assumption. 

6.  Multivariate  Two-Way  Cross  Classification 

The  system  of  equations  (2.6)  and  (2.7)  are  readily  extended  to 
include  multivariate  regression  and  design  situations.  We  indicate  how 
this  may  be  done  for  the  case  of  a  two-way  cross  classified  design.  The 
arguments  are  similar  for  other  designs  and  regression  problems. 
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The  two-way  cross  classified  model  may  be  written 


E<xjki)  =  “  *  “j  *  sk 


(6.1) 


j  =  1,2,. ...a,  k  =  1,2,. ..,b,  l  =  1,2, ••  •  »njk»  njk  >  1.  The  xjk£ 
are  assumed  to  be  p-dimensional  Gaussian  with  covariance  matrix  V.  The 
quantities  u»  a.,  8.  are  p*l  location  vectors.  Define 

3  K 

$.,(u)  =  exp{iuT(p  +  a.  +  8.  )  -  \  uTVu},  (6.2) 

3k  3  K 

f^k(x)  =  |  2ttV | exp{-  h  (x-y-a^.-3k)T  V-1  (x-p-a.-8k)}  ,  (6.3) 


and 


oj( 4>(u) )  =  exp(  -  |  uTVu)  ,  (6.4) 

f  (x)  =  |  2itAV  I  ^  exp(-  \  x^(AV)  ^x)  .  (6.5) 

u) 

The  system  of  equations  parallel  to  (2.6)  and  (2.7)  are 


a  b  n 


l  l  l 

j=l  k=l  1=1 


jk 


f  3^-v(u) 


^  —  (^(u)  -  exp(iuXjkJl))*  | u»( 4>( u) )  | 2  du  =  0  (6.6) 


or,  equivalently 


a  b  n-;  v  f 

<2*>P  l  l  f  f 

j  =  l  k=l  «,=  !  I 


3fik(x)  x  x 


30 


*  f  (x)(f.,  (x)ftf  (x) 
id  3  k  u> 


fa>(x-XjkJl))dX  =  °» 


P 


(6.7) 


with  arguments  0=y,  a.,  8V,  V,  i  =  l,2,...,a,  k  =  l,2,...,b.  Equation 
(6.6)  may  be  explicitly  evaluated  and  leads  to  the  implicit  equations 


f  l  |  (xjkt  -  »  -  “j  -  V  'jkt.x  = 


(6.13) 
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Observations  which  require  special  consideration  are  indicated,  as 

in  section  5,  by  low  values  of  v.  vis-a-vis  the  whole  set.  A  low  value 

3  9  A 

of  ^  may  mean  that  the  particular  observation  is  a  potential  outlier. 

Too  many  low  values  will  imply  that  the  model  assumption  of  a  single 
Gaussian  parent  may  not  be  warranted  or  that  the  model  is  mis-  specified 
or  that  there  are  indeed  a  number  of  potential  outliers.  In  the  latter  case 
a  goodness-of-fit  test  will  usually  declare  against  the  Gaussian  error 
distribution.  Furthermore,  if  n.,  >  1  and  we  find  that  individual  cells  have 
low  values  ^  associated  with  them,  then  interaction  in  the  table  is  a 

distinct  possibility.  In  this  case  we  generalize  the  model  to 


E(xj]a)  =  y  +  a.  +  8k  t  Yjk. 
and  proceed  accordingly. 

This  multivariate  procedure  can  be  especially  useful  for  exploratory 
purposes.  Determination  of  the  sensitivity  of  v^k^  ^  and  the  parameter 
estimates  to  changes  in  X  will  serve  to  uncover  potential  problems  with 
the  data  or  the  model  considered  as  a  unit.  The  procedure  is  computationally 
inexpensive  and  easy  to  use.  This  procedure  does  not  apparently  lend  itself 
to  hypothesis  testing  problems  per  se.  However,  it  could  be  effectively 
used  in  conjunction  with  tests  of  hypotheses.  If  the  sensitivity  analysis 
uncovers  some  difficulty  with  the  data  or  the  model,  then  a  test  of  hypothesis 
may  be  appropriate. 

Example  6.1.  The  data  (Table  5)  for  this  example  were  taken  from 
Anderson  (1958,  p.  218)  who  gives  some  additional  background  concerning 
these  data.  The  first  component  of  the  observation  vector  is  a  barley 
yield  in  a  given  year;  the  second  component  is  the  .ame  measurement  made 
the  following  year. 


LOCATION 


Table  5 


VARIETIES 


1 

2 

3 

4 

5 

81 

105 

120 

110 

98 

81 

82 

80 

87 

84 

147 

142 

151 

192 

146 

100 

116 

112 

148 

108 

82 

77 

78 

131 

90 

103 

105 

117 

140 

130 

120 

121 

124 

141 

125 

99 

62 

96 

126 

76 

99 

89 

69 

89 

104 

66 

50 

97 

62 

80 

87 

77 

79 

102 

96 

68 

67 

67 

92 

94 
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We  fit  the  model  (6.1)  to  this  data  by  the  method  of  maximum 
likelihood  and  by  the  modified  integrated  squared  error  method  for  various 
values  of  X  with  the  objective  of  performing  a  sensitivity  analysis.  The 
results  of  this  analysis  are  summarized  in  Tables  6  and  7.  We  have  only 
given  the  results  for  X=2  since  the  response  of  the  parameter  estimates 
and  the  final  weights  to  decreases  in  X  continues  the  trend  evidenced  in 
Tables  6  and  7.  Table  6  indicates  that  the  largest  change  occurred  in 
the  parameter  a 5  and  the  covariance.  The  correlation  increased  from  .22 
to  .37.  The  final  weights  v.,  are  given  in  Table  7.  Observation  (5,3) 
receives  an  especially  low  weight  while  observations  (1,3),  (3,4),  (5,4), 
and,  to  a  lesser  extent,  (2,4)  also  receive  low  weights.  It  is  likely 
that  (5,3)  is  an  outlier  although  we  have  not  applied  any  tests  of  dis¬ 
cordancy.  We  are  suggesting  that  low  weights  raise  the  suspicion  of 
potential  outliers  or  other  difficulties  with  the  data  and  the  model 
(Gaussianity,  additivity,  etc.).  We  are  not  suggesting  that  this  procedure 
be  used  as  a  formal  test  for  outliers.  These  observations  have  had  the 


effect  of  reducing  the  correlation  between  the  first  and  second  year  yields. 


A  rough  rule  of  how  improbable  the  weights  are  may  be  determined  by 

_  >p-.-l 

the  following  considerations.  The  quadratic  form  (x.  -y)  V  (x.,-y)  is 

)k  Dk 

2 

approximately  x  on  2  degrees  of  freedom.  The  1%  point  of  this  distribution 


is  9.21.  Roughly,  for  X=2,  we  would  expect  weights  less  than 
exp( -  9 . 21/( 2( 1+2X) ) )  =  .4Q  about  1%  of  the  time. 


Additional  reduction  of  X  say  to  1.5  will  produce  a  somewhat  stronger 
version  of  basically  the  same  results.  However,  when  X  is  decreased  to 


unity  the  procedure  starts  to  break  down  in  the  sense  that  the  singled-out 
observations  above  receive  weights  near  0  and  a  few  of  the  other  originally 


T 


Table  6 


Maximum  Likelihood  (ML)  and  Modified  Integrated 
Squared  Error  (X=2)  Parameter  Estimates 


low  weights  have  been  further  reduced.  The  first  component  variance  is 
dramatically  reduced.  The  reason  for  this  is  that  some  of  the  observations 
are  separated  in  such  a  way  that  the  empirical  density  estimator  around 
it  intersects  only  slightly  with  the  empirical  density  estimator  around 
other  observations.  Accordingly,  multiple  densities  are  perceived  by  the 
procedure  and  the  more  isolated  and  less  massive  with  respect  to  the 
model  (6.1)  and  the  Gaussian  assumption  are  basically  excluded  from  the 
estimates  by  virtue  of  their  low  weights. 

When  A=0,  the  density  estimate  f^(x-x^)of  (6.7)  around  each  point 

x.,  becomes  a  Dirac  delta  fraction.  This  fact  does  not  ensure  that  all 
Ilk 

but  a  few  weights  will  tend  to  zero,  however.  If  the  model  (6.1)  is 
truly  appropriate  and  if  the  data  x.^,  i>  1  replicates  are  truly  Gaussian, 
then  the  variances  in  the  covariance  matrix  will  ultimately  begin  to  increase 
with  further  decreases  in  X. 

It  would  be  desirable  to  have  an  estimator  of  V  whose  influence  function 
redescends  to  zero.  Such  an  estimator  is  derivable  from  the  modified  inte¬ 
grated  squared  error  procedure  but  we  do  not  present  it  here.  Still  another 
approach,  involving  a  generalization  of  Shannon's  information  or  likelihood 
produces  very  similar  estimators. 
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